2 296 840 new breast cancer patients in 2022¹.
Aim of project: Exploring and analyzing patterns in breast cancer gene expression data.
The analysis was performed on the dataset “GDC TCGA Breast Cancer (BRCA)” from xenabrowser.net
Our data:
Notes: Materials: What data did you use and where did you get it from?
Data obtained programmatically
Pivoted the gene expression dataset to long format to make it tidy
The two datasets were joined on the patient IDs
Mutated the dataset to add new columns:
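A minimal sketch of the wrangling steps listed above (pivot longer, join on patient ID, mutate), assuming dplyr, tidyr and tibble are installed. The tables below are invented toy data, not the TCGA BRCA files, and every name in the sketch (`expression_wide`, `clinical`, `patient_id`, `stage`, `days_to_death`, `deceased`) is hypothetical. The derived column only illustrates `mutate()` and is not necessarily one of the columns the project added.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for the expression matrix: one row per gene, one column per patient.
# Gene names, patient IDs and values are invented for illustration.
expression_wide <- tribble(
  ~gene,    ~P01, ~P02, ~P03,
  "GENE_A",  2.1,  3.4,  1.8,
  "GENE_B",  0.5,  0.7,  0.2
)

# Toy stand-in for the clinical table, keyed by the same patient IDs.
clinical <- tribble(
  ~patient_id, ~stage,      ~days_to_death,
  "P01",       "Stage I",   NA_real_,
  "P02",       "Stage III", 820,
  "P03",       "Stage II",  NA_real_
)

# Pivot longer: one row per (gene, patient) pair instead of one column per patient.
expression_long <- expression_wide %>%
  pivot_longer(cols = -gene, names_to = "patient_id", values_to = "expression")

# Join the two tables on patient ID, then add a derived column.
# `deceased` is a hypothetical example of a mutated column.
brca_tidy <- expression_long %>%
  inner_join(clinical, by = "patient_id") %>%
  mutate(deceased = !is.na(days_to_death))

print(brca_tidy)
```

An inner join keeps only patients present in both tables. Whether the project used an inner, left or full join is not stated here, so that choice is also an assumption.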
Analytical methods:
Notes: Methods: Which modelling did you use? Think of the methods section as a recipe for how to go from raw to results => Flow chart?
Show flowchart here!!!
Show Jonas plot here
Catching the cancer at an early stage seems to increase the chance of survival
Limitations and future work
R4BDS - Group 28